Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ast::Lit::kind #101885

Closed
wants to merge 3 commits into from
Closed

Conversation

nnethercote
Copy link
Contributor

This is an alternative to the "Remove token::Lit from ast::Lit commit in #101562.

r? @petrochenkov

@rustbot rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Sep 16, 2022
@rustbot
Copy link
Collaborator

rustbot commented Sep 16, 2022

Some changes occurred in src/librustdoc/clean/types.rs

cc @camelid

Some changes occurred in src/tools/rustfmt

cc @rust-lang/rustfmt

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 16, 2022
@nnethercote
Copy link
Contributor Author

nnethercote commented Sep 16, 2022

The best thing about this: ast::Lit shrinks from 48 bytes to 20 bytes. Though that's more than we need, going below 32 bytes doesn't currently provide any extra benefit. And couple of places in the code get a little cleaner.

The downside is that every literal has to be processed at least twice: once during parsing to detect errors (where we discard the converted semantic literal) and again during lowering to HIR. On some benchmarks this is noticeable:

Benchmark Profile Scenario % Change Significance Factor?
deunicode-1.3.1 check full 17.39% 86.93x
coercions check full 3.09% 15.43x
unicode_categories-0.1.1 check full 1.24% 6.18x
pest-2.1.3 check full 0.63% 3.17x
deep-vector check full 0.51% 2.53x
unicode-normalization-0.1.19 check full 0.31% 1.53x
ctfe-stress-5 check full -0.31% 1.55x

deunicode-1.3.1 is interesting. It uses include_bytes! to create a 400 KB byte vector. With the old code there was no unescaping, and with the new code it gets unescaped once, which accounts for the big slowdown.

The code is a bit rough, with some njn comments on things that need cleaning up. Also, I haven't converted clippy code yet. And I haven't removed the symbol_unescaped field from ast::StrLit, though that will be easy.

I'm not sure if this is a good idea, but I thought I'd do it to help the discussion in #101562.

@rust-log-analyzer

This comment has been minimized.

@petrochenkov
Copy link
Contributor

The downside is that every literal has to be processed at least twice: once during parsing to detect errors (where we discard the converted semantic literal) and again during lowering to HIR.

The goal of this change is exactly to stop semantically processing the literals during parsing (parsing Rust proper syntax, not syntaxes introduced by macros), and doing it only if they reach HIR, similarly to what float literals do now.

macro_rules! m { ($tt:tt) => () }

m!("str"suffix); // OK right now

#[cfg(FALSE)]
fn f() {
    "str"suffix; // ERROR, but probably shouldn't be
                 // will be OK if semantic checking of literals is delayed until HIR
}

fn main() {
    "str"suffix; // ERROR, expected semantic error
}

Semantic literals are sometimes needed before lowering, typically in attributes.
I guess the simplest solution is to just continue eagerly converting them to semantic form when parsing attributes specifically.

If this is done (together with moving spans "upwards" from #101562), then most of AST will basically keep token::Lits directly and ast::Lit will be eliminated.

deunicode-1.3.1 is interesting. It uses include_bytes! to create a 400 KB byte vector. With the old code there was no unescaping, and with the new code it gets unescaped once, which accounts for the big slowdown.

include_bytes is kind of a separate issue (#65818), we need to figure out how to process it without extra conversions anyway.

@nnethercote
Copy link
Contributor Author

The goal of this change is exactly to stop semantically processing the literals during parsing (parsing Rust proper syntax, not syntaxes introduced by macros), and doing it only if they reach HIR, similarly to what float literals do now.

I tried removing the check-and-discard code from Lit::from_token:

// We need to do the conversion to detect errors, even though we
// discard the result.
_ = LitKind::from_token_lit(token_lit)?;

and I got tons of ui test failures. But that's likely because I have lots of LitKind::from_token_lit(x).unwrap() calls in the places where the semantic literals are needed before lowering. I guess I could make those places fallible and then hopefully things will work out and the double processing will be avoided.

Semantic literals are sometimes needed before lowering, typically in attributes. I guess the simplest solution is to just continue eagerly converting them to semantic form when parsing attributes specifically.

Yes, it's a tiny fraction of the cases, so shouldn't matter for performance.

If this is done (together with moving spans "upwards" from #101562), then most of AST will basically keep token::Lits directly and ast::Lit will be eliminated.

I noticed that too. 👍 ast::LitKind will still exist.

It sounds like you are more in favour of this PR's approach than the first commit in #101562?

@petrochenkov
Copy link
Contributor

It sounds like you are more in favour of this PR's approach than the first commit in #101562?

I need to investigate this a bit, I just don't have time right now.
For now I suggest just landing #101562 without ceab0b4.

@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 19, 2022
@bors
Copy link
Contributor

bors commented Sep 19, 2022

⌛ Trying commit c5a1b53e13f7ded9f8e9a269dc3c8a865e977e38 with merge d39d569d36b74d4b16fe4a1451452b23e16ae7ff...

@nnethercote
Copy link
Contributor Author

@petrochenkov: I have finished this. I haven't done StrLit, that can be done in a later PR. I also haven't moved the Span out of ast::Lit.

The places that need the semantic literal prior to lowering are a little more complicated in their error handling, and a few error messages have changed, but they are all quite obscure ones. Local measurements show that it is now performance-neutral, except for deunicode-1.3.1.

@bors
Copy link
Contributor

bors commented Sep 19, 2022

☀️ Try build successful - checks-actions
Build commit: d39d569d36b74d4b16fe4a1451452b23e16ae7ff (d39d569d36b74d4b16fe4a1451452b23e16ae7ff)

@rust-timer
Copy link
Collaborator

Queued d39d569d36b74d4b16fe4a1451452b23e16ae7ff with parent efa717b, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (d39d569d36b74d4b16fe4a1451452b23e16ae7ff): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
0.5% [0.3%, 0.7%] 6
Regressions ❌
(secondary)
1.2% [1.2%, 1.2%] 2
Improvements ✅
(primary)
-0.3% [-0.5%, -0.2%] 5
Improvements ✅
(secondary)
-0.4% [-0.8%, -0.2%] 12
All ❌✅ (primary) 0.1% [-0.5%, 0.7%] 11

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
4.2% [3.0%, 6.3%] 4
Improvements ✅
(primary)
-1.1% [-2.1%, -0.6%] 11
Improvements ✅
(secondary)
-2.9% [-5.1%, -1.4%] 5
All ❌✅ (primary) -1.1% [-2.1%, -0.6%] 11

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.3% [2.3%, 2.3%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.5% [-4.5%, -2.5%] 2
All ❌✅ (primary) - - 0

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 19, 2022
@rust-log-analyzer

This comment has been minimized.

@nnethercote
Copy link
Contributor Author

Some small performance ups and downs, but overall it's roughly neutral, as expected.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Sep 19, 2022
Because they are sometimes hot and each have a single call site.
This will become interesting in the next commit.
This shrinks `ast::Lit` kind considerably. The conversion to semantic
literals now happens at AST lowering time, with a few exceptions where
the literal values are needed before that.
@bors
Copy link
Contributor

bors commented Sep 21, 2022

☔ The latest upstream changes (presumably #101558) made this pull request unmergeable. Please resolve the merge conflicts.

@petrochenkov
Copy link
Contributor

MetaItem and friends are used in both HIR and AST (I'd say primarily in HIR rather than in AST).
So keeping a lowered literal there is quite reasonable.

Maybe we should keep ast::Lit as is and apply the ast::Lit -> token::Lit change only to expressions (ExprKind::Lit).
This will also be a minimal change that can go to lang team for approval (not checking literals in configured out code is a language change).

(If there are ways to make literal parsing in attributes more lazy too, then we can try them later, probably without further lang team approval.)
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 7, 2022
@nnethercote
Copy link
Contributor Author

We seem to be in agreement that ast::Lit should be trimmed down.

  • In Shrink ast::Expr harder #101562 I removed the token_lit field, which is an "early lowering" approach, similar to what we currently do.
  • In this PR I removed the kind field, which is the "late lowering" approach.

You don't seem happy with either approach, and now you're suggesting a hybrid approach where some literals (in attributes) get early lowering and some (in expressions) get late lowering. I think this would be a clumsy mix that is worse than the other two approaches.

In fact, the "late lowering" approach above is also a hybrid approach, because some early lowerings are required to process attributes. It's just that the early-lowered literals aren't stored.

Therefore, I think the token_lit removal from #101562 is best. I don't see much benefit of late lowering. And token_lit removal doesn't change any existing behaviour.

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 10, 2022
@petrochenkov
Copy link
Contributor

And token_lit removal doesn't change any existing behaviour.

I do want to change the language behavior though, that's what the removed FIXME is about, AST shrinkage by itself is a secondary goal.

You don't seem happy with either approach, and now you're suggesting a hybrid approach where some literals (in attributes) get early lowering and some (in expressions) get late lowering. I think this would be a clumsy mix that is worse than the other two approaches.
In fact, the "late lowering" approach above is also a hybrid approach, because some early lowerings are required to process attributes. It's just that the early-lowered literals aren't stored.

Literals in attributes (Attribute/AttrItem) are not lowered, they are lowered in MetaItems which are not even stored in AST and by themselves are a lowered semantic representation of built-in attributes.
Two kinds of lowering in question (AST -> HIR and Attribute -> MetaItem) just don't match perfectly with each other (MetaItems are occasionally needed before HIR, and in other cases not needed at all?), but otherwise it looks quite logical to me - semantic representation needs semantic literals, syntactic representation can use token literals.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 10, 2022
@nnethercote
Copy link
Contributor Author

What's the benefit of the proposed language change? Can you give an example?

@petrochenkov
Copy link
Contributor

  • Better consistency between arguments of fn-like and attribute macros (Remove ast::Lit::kind #101885 (comment)), better consistency with float literals which are checked late.
  • Avoiding doing unnecessary work, literals are not converted to semantic form if that form is not used, AST size shrinking also fits into this.

@nnethercote
Copy link
Contributor Author

We ended up taking a different approach in #102944.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants